良好的善解人意对话系统应首先跟踪并理解用户的情绪,然后以适当的情感回复。但是,目前对此任务的方法要么集中于提高对用户情绪的理解或提出更好的反应策略,而且很少有作品同时考虑这两种工作。我们的工作试图填补这一空缺。受到任务导向对话系统的启发,我们提出了一种具有情感感知对话管理的新颖善解人意的响应生成模型。情绪感知对话管理包含两个部分:(1)情绪状态跟踪保持当前用户的情绪状态,(2)善解人意的对话策略选择预测目标情绪和用户的意图,基于情绪状态跟踪的结果。然后,预测信息用于指导响应的产生。实验结果表明,与自动评估和人类评估下的几个基准相比,动态管理不同的信息可以帮助模型产生更多的移情反应。
translated by 谷歌翻译
A crucial issue of current text generation models is that they often uncontrollably generate factually inconsistent text with respective of their inputs. Limited by the lack of annotated data, existing works in evaluating factual consistency directly transfer the reasoning ability of models trained on other data-rich upstream tasks like question answering (QA) and natural language inference (NLI) without any further adaptation. As a result, they perform poorly on the real generated text and are biased heavily by their single-source upstream tasks. To alleviate this problem, we propose a weakly supervised framework that aggregates multiple resources to train a precise and efficient factual metric, namely WeCheck. WeCheck first utilizes a generative model to accurately label a real generated sample by aggregating its weak labels, which are inferred from multiple resources. Then, we train the target metric model with the weak supervision while taking noises into consideration. Comprehensive experiments on a variety of tasks demonstrate the strong performance of WeCheck, which achieves a 3.4\% absolute improvement over previous state-of-the-art methods on TRUE benchmark on average.
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
Recent advances in operator learning theory have improved our knowledge about learning maps between infinite dimensional spaces. However, for large-scale engineering problems such as concurrent multiscale simulation for mechanical properties, the training cost for the current operator learning methods is very high. The article presents a thorough analysis on the mathematical underpinnings of the operator learning paradigm and proposes a kernel learning method that maps between function spaces. We first provide a survey of modern kernel and operator learning theory, as well as discuss recent results and open problems. From there, the article presents an algorithm to how we can analytically approximate the piecewise constant functions on R for operator learning. This implies the potential feasibility of success of neural operators on clustered functions. Finally, a k-means clustered domain on the basis of a mechanistic response is considered and the Lippmann-Schwinger equation for micro-mechanical homogenization is solved. The article briefly discusses the mathematics of previous kernel learning methods and some preliminary results with those methods. The proposed kernel operator learning method uses graph kernel networks to come up with a mechanistic reduced order method for multiscale homogenization.
translated by 谷歌翻译
Behavior constrained policy optimization has been demonstrated to be a successful paradigm for tackling Offline Reinforcement Learning. By exploiting historical transitions, a policy is trained to maximize a learned value function while constrained by the behavior policy to avoid a significant distributional shift. In this paper, we propose our closed-form policy improvement operators. We make a novel observation that the behavior constraint naturally motivates the use of first-order Taylor approximation, leading to a linear approximation of the policy objective. Additionally, as practical datasets are usually collected by heterogeneous policies, we model the behavior policies as a Gaussian Mixture and overcome the induced optimization difficulties by leveraging the LogSumExp's lower bound and Jensen's Inequality, giving rise to a closed-form policy improvement operator. We instantiate offline RL algorithms with our novel policy improvement operators and empirically demonstrate their effectiveness over state-of-the-art algorithms on the standard D4RL benchmark.
translated by 谷歌翻译
Universal Image Segmentation is not a new concept. Past attempts to unify image segmentation in the last decades include scene parsing, panoptic segmentation, and, more recently, new panoptic architectures. However, such panoptic architectures do not truly unify image segmentation because they need to be trained individually on the semantic, instance, or panoptic segmentation to achieve the best performance. Ideally, a truly universal framework should be trained only once and achieve SOTA performance across all three image segmentation tasks. To that end, we propose OneFormer, a universal image segmentation framework that unifies segmentation with a multi-task train-once design. We first propose a task-conditioned joint training strategy that enables training on ground truths of each domain (semantic, instance, and panoptic segmentation) within a single multi-task training process. Secondly, we introduce a task token to condition our model on the task at hand, making our model task-dynamic to support multi-task training and inference. Thirdly, we propose using a query-text contrastive loss during training to establish better inter-task and inter-class distinctions. Notably, our single OneFormer model outperforms specialized Mask2Former models across all three segmentation tasks on ADE20k, CityScapes, and COCO, despite the latter being trained on each of the three tasks individually with three times the resources. With new ConvNeXt and DiNAT backbones, we observe even more performance improvement. We believe OneFormer is a significant step towards making image segmentation more universal and accessible. To support further research, we open-source our code and models at https://github.com/SHI-Labs/OneFormer
translated by 谷歌翻译
Adaptive mesh refinement (AMR) is necessary for efficient finite element simulations of complex physical phenomenon, as it allocates limited computational budget based on the need for higher or lower resolution, which varies over space and time. We present a novel formulation of AMR as a fully-cooperative Markov game, in which each element is an independent agent who makes refinement and de-refinement choices based on local information. We design a novel deep multi-agent reinforcement learning (MARL) algorithm called Value Decomposition Graph Network (VDGN), which solves the two core challenges that AMR poses for MARL: posthumous credit assignment due to agent creation and deletion, and unstructured observations due to the diversity of mesh geometries. For the first time, we show that MARL enables anticipatory refinement of regions that will encounter complex features at future times, thereby unlocking entirely new regions of the error-cost objective landscape that are inaccessible by traditional methods based on local error estimators. Comprehensive experiments show that VDGN policies significantly outperform error threshold-based policies in global error and cost metrics. We show that learned policies generalize to test problems with physical features, mesh geometries, and longer simulation times that were not seen in training. We also extend VDGN with multi-objective optimization capabilities to find the Pareto front of the tradeoff between cost and error.
translated by 谷歌翻译
对移动障碍的检测和细分,以及对当地环境的未来占用状态的预测,对于自动驾驶汽车,必不可少的自动驾驶行动至关重要。在本文中,我们提出了一个框架,该框架使用深层神经网络体系结构将两个功能集成在一起。我们的方法首先检测到现场移动对象的段,并使用此信息来预测自动驾驶汽车周围环境的时空演化。为了解决静态动态对象分割和环境预测模型直接集成的问题,我们建议在整个框架中使用基于占用的环境表示。我们的方法在现实Waymo打开数据集上进行了验证,并证明了比基线方法更高的预测准确性。
translated by 谷歌翻译
与2D车道相比,实际3D车道数据很难准确收集。在本文中,我们提出了一种仅使用2D车道标签训练3D车道的新方法,称为弱监督的3D车道检测WS-3D车道。通过在相邻车道上的恒定车道宽度和相等高度的假设,我们间接监督训练中的3D车道高度。为了克服数据收集过程中相机音调动态变化的问题,提出了相机音调自校准方法。在锚固表示中,我们提出了一个具有改进的非限量抑制(NMS)方法的双层锚,该方法使基于锚的方法可以预测两条接近的车道线。实验是在两种监督方法下在3D-LANENEN的基础上进行的。在弱监督的环境下,我们的WS-3D车道的表现优于先前的3D-LANEN:APOLLO 3D合成数据集的F得分上升到92.3%,而F1在3DDLANES上上升到74.5%。同时,在纯监督环境中的WS-3D车道可以提高更多的增量,并且优于最先进的设置。据我们所知,WS-3D车道是在弱监督环境下进行3D车道检测的第一次尝试。
translated by 谷歌翻译
考虑到安全至关重要自动化系统中情境意识的功能,对驾驶场景的风险及其解释性的感知对于自主和合作驾驶特别重要。为了实现这一目标,本文提出了在驾驶场景中的共同风险定位的新研究方向及其作为自然语言描述的风险解释。由于缺乏标准基准,我们收集了一个大规模数据集,戏剧性(带有字幕模块的驾驶风险评估机制),该数据集由17,785个在日本东京收集的互动驾驶场景组成。我们的戏剧数据集适用于带有相关重要对象的驾驶风险的视频和对象级别的问题,以实现视觉字幕的目标,作为一种自由形式的语言描述,利用封闭式和开放式响应用于多层次问题,可以用来使用这些响应,可用于在驾驶场景中评估一系列视觉字幕功能。我们将这些数据提供给社区以进行进一步研究。使用戏剧,我们探索了在互动驾驶场景中的联合风险定位和字幕的多个方面。特别是,我们基准了各种多任务预测架构,并提供了关节风险定位和风险字幕的详细分析。数据集可在https://usa.honda-ri.com/drama上获得
translated by 谷歌翻译